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Abstract 



Single molecule force spectroscopy reveals unfolding of domains in titin 
upon stretching. We provide a theoretical framework for these experiments by 
computing the phase diagrams for force-induced unfolding of single domain 
proteins using lattice models. The results show that two-state folders (at zero 
force) unravel cooperatively whereas stretching of non-two-state folders oc- 
curs through intermediates. The stretching rates of individual molecules show 
great variations reflecting the heterogeneity of force-induced unfolding path- 
ways. The approach to the stretched state occurs in a step-wise "quantized" 
manner. Unfolding dynamics depends sensitively on topology. The unfolding 
rates increase exponentially with force / till an optimum value which is deter- 
mined by the barrier to unfolding when / = 0. A mapping of these results to 
proteins shows qualitative agreement with force-induced unfolding of Ig-like 
domains in titin. We show that single molecule force spectroscopy can be 
used to map the folding free energy landscape of proteins in the absence of 
denaturants. 



1 



I. INTRODUCTION 



Titin, a giant protein molecule responsible for elasticity of muscles, is comprised of a few 
hundred immunoglobulin (Ig) and fibronectin-III repeats aligned in tandem Recently, 
through nanomanipulation of single protein molecules, there has been direct evidence for 
sequential unfolding of individual domains upon stretching 0-^]. These remarkable experi- 
ments and others on DNA have made it possible to unearth the microscopic underpinnings 
of the unusual elastic behavior in biological molecules. In two experiments |||| individual 
titin molecules were tethered to a plastic bead and optical tweezers were used to stretch the 
molecule. Direct measurement of the forces required to stretch titin were used to infer that 
tension leads to unfolding of individual Ig-like domains Perhaps, the clearest evidence 
for domain unraveling was presented by Rief et al. || who used atomic force microscopy 
(AFM) to pull on titin molecules adsorbed onto a gold surface. The AFM experiments, on 
both the model recombinant titin molecules consisting only of Ig (Ig4 and Igg) domains and 
the native titin, showed clear saw-tooth patterns in the force-extension curves indicating se- 
quential unfolding of domains. The constant periodicity of the saw-tooth pattern (f» 25nm) 
is nearly coincident with the dimensions of the fully unfolded Ig domain 29nm) and 
is very similar to the contour length inferred from fitting the force-extension curves ob- 
tained from the optical tweezer experiments [|],|5| (see also related experiments on tenascin 
||). All the experiments conclude that sequential unraveling of the domains results upon 
mechanically stretching titin. 

Inspired by these experiments, we report the results of force-induced unfolding of single 
domain proteins using simple lattice models which have been useful in the search for general 
principles of protein folding ||. Because the primary mechanism of stretching titin involves 
unraveling of individual Ig-like domains, which fold spontaneously in the absence of tension 
TD|, TTH , our calculations provide microscopic origins of force induced unfolding. We show 



that the response of proteins to force depends primarily on their topology in the absence of 
force. By computing the phase diagram and kinetics of a number of model proteins subject 
to tension we show that the folding free energy landscape P,|T2"|| in the absence of force can 
be deciphered using single molecule manipulation techniques. 



II. MATERIALS AND METHODS 

The polypeptide chain is modeled as a sequence of N connected beads on a cubic lattice. 
The energy of a conformation (given by the vectors fi with i = 1, 2, 3, N) is 

N-3 N 

E v = E E B^Qn-f^-a) (1) 

i=l j=i+3 

where S(x) is the Kronecker delta function, a is the lattice spacing, and Bij is the contact 
energy between beads i and j. The set of matrix elements B^ specifies a sequence. We 
use two types of contact potentials, the statistical potentials derived by Kolinski, Godzik, 
and Skolnick (KGS) |13[ and the random bond (RB) model ||14|| . The applied force to the 
terminal beads yields an additional energy 

E f = ~fz, (2) 



2 



where z — \?x — rjv| is the extension. Since the polypeptide chain is on lattice, where contin- 
uous overall rotations are not possible, we assume that upon stretching there is alignment of 
the protein along the force direction with zero torque. This is equivalent to the assumption 
that the relaxation time for the overall rotational degrees of freedom is much shorter than 
that for structural relaxation that is responsible for unfolding or folding processes. Thus we 
take the absolute value of z to represent the energy due to stretching. The total energy of 
the chain conformation is given by the sum of Eqs. ([I]) and (Q). We use dimensionless units 
for energy whose typical value is in the range (1 — 2) /c^T; length is measured in units of a 
(=0.38nm), and temperature is measured in units of energy//^. For purposes of mapping 
these to physical values we use 2k gT for energy which means force in our simulations is 
measured in multiples of about 20 pN. 

The thermodynamics as a function of force (/) and temperature (T) is obtained using 
a variant of the multiple histogram method [pjl in conjunction with standard Metropolis 
Monte Carlo simulations [T^] . When / > the collection of histograms at different temper- 



atures and zero force becomes unreliable because highly stretched conformations are almost 
never sampled. Such states, which have negligible Boltzmann weight in the absence of force, 
become thermodynamically important upon stretching. It proves more convenient to collect 
histograms at a fixed value of T and at various values of / so that all relevant states across 
the entire (/, T) plane are adequately sampled. 

To characterize the degree of similarity of an arbitrary sequence conformation with the 
native structure we use the overlap function [|T?|] defined as 

x=i - w-In+i .jzj^-^' (3) 

where is the distance between the beads % and j, is the corresponding distance in the 
native conformation, and S(x) is the Kronecker delta function. 

The kinetic simulations of force-induced unfolding were performed at a constant temper- 
ature T s (below Tp, the folding temperature) that satisfies the condition < \{f — 0,T S ) >= 
0.15. Starting from the native conformation the force was suddenly increased to f s so that 
at (f s ,T s ) stretched rod-like conformations are stable. The unfolding kinetics is monitored 
by computing the distribution of stretch times, t s> u, which is the first instance a trajectory 
i reaches a stretched state with no contacts. Typically M = 800 trajectories have been 
generated for the calculation of unfolding rate. From the distribution of stretch times the 
mean unfolding time and the fraction of folded molecules at time t can be calculated WA 



These probes together with the dynamics of rupture of tertiary contacts and the time de- 
pendence of extension are used to obtain the unfolding pathways. The mean unfolding time 
is t u = -g Y$=i T s,u the inverse of which is the unfolding rate k u . 

In order to obtain the general characteristics of force-induced unfolding we computed the 
phase diagram and kinetics for five sequences (four 27-mers and one 36-mer) and differing 
interaction potentials. We chose four 27-mer and one 36-mer sequences, whose thermody- 
namic and kinetic characteristics in the absence of force are documented elsewhere ||I4| , |T8"|| , 
to investigate unfolding transitions upon stretching. Three of the 27-mer and 36-mer se- 
quences fold kinetically and thermodynamically by two-state mechanism. These sequences 
have small values of a = (Tq — Tp)/Tg, where Tq and Tp are the collapse and folding tran- 



sition temperatures respectively when / = [14,18 1 . The fourth 27-mer sequence with B 
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given by RB potentials has a = 0.11, and its folding (thermodynamics and kinetics) reveals 
intermediates [14]. Thus, with these sequences, we can investigate the effect of stretching 



for protein-like models that display distinct folding mechanisms in the absence of force. For 
purposes of illustration we present results for a 36-mer sequence with the KGS potentials 
(a fa 0) and the 27-mer RB model sequence with a ~ 0.11. 



III. RESULTS 

The phase diagram for the 36-mer, which exhibits two-state cooperative thermal unfold- 
ing when / = 0, is given in Fig. (la). The states of the polypeptide chain are represented 
by the thermal average overlap function < T) >, where x gives the degree of similarity 
to the native state. In particular, small values of the overlap function correspond to confor- 
mations that belong to the native basin of attraction (NBA). The color codes in Fig. (1) 
are such that red corresponds to small < %(/, T) > (high native content and folded states), 
while the blue region has large < x{fi T) > representing unfolded states. We see from Fig. 
(la) that the (/, T) plane divides into predominantly red (folded states) and blue (unfolded 
states) regions. In the red region, the overlap < x{f, T) 0.1 and the probability of being 
in the NBA is greater than 0.5 [IS]. In the blue region < x{f\ T) > is typically greater than 



0.8 and the probability of being in the NBA is almost zero. There is only a narrow band of 
the green region which suggests that the force-induced unfolding transition for this sequence 
is an all-or-none process with no signature of intermediates. 

The sharp boundary between folded and unfolded states resembles that of the type I 
superconductors in the (H, T) plane where H is the applied magnetic field. With this 
analogy the locus of points separating the NBA and the unfolded states is given by 

where f c is the critical force required to unfold the protein, / is the value of f c at T — 0, 
and Tp is the folding transition temperature at zero force. Both f and a depend on the 
sequence and the native state topology. The fit using Eq. (|J) gives f ~ 0.98 and a ~ 6.0 
for the 36-mer. An independent estimate for f can be made by using \Eq\ ~ foAL, where 
Eq is the energy of the native state and AL is the gain in the end-to-end distance of the 
polypeptide chain upon stretching to a fully extended rod state. For the 36-mer Eq = —30.4 
and AL « 30.9 that leads to /„ « 0.98. 

The phase diagram for a sequence whose folding (in the absence of force) involves inter- 
mediates is shown in Fig. (lb). Although the general appearance is similar to that shown 
in Fig. (la) there are clear differences. The region of stability of the NBA (red region) is 
confined to low temperatures and small forces. Secondly, the boundary between the folded 
and unfolded states is fuzzy and contains a broad green region. This suggests that the force 
(or temperature)-induced unfolding is likely to be non-cooperative involving intermediates. 
This is reflected in the force-extension curves which show signatures of intermediates (D. 
Klimov & D. Thirumalai, unpublished). In contrast, for two-state folders with a ~ 0, the 
force-extension curves show that at / ~ /o the chain abruptly unfolds to a stretched confor- 
mation without populating any detectable intermediates. In fact, the unfolding transition 
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occurs in an extremely narrow interval of force which for the 36-mer is O.Ol/o (D. Klimov & 
D. Thirumalai, unpublished). 

We shall point out that the contact interaction energies (or more precisely the potential 
of mean force) are dependent on temperature. It has been argued that the temperature 
dependence of contact interactions has to be included in order to reproduce certain exper- 



imental observations in proteins [ 19| . However, we expect the qualitative features of the 



phase diagrams seen in Fig. (1) will be observed experimentally regardless of the details of 
the interaction energies. 

How is the completely stretched conformation kinetically reached starting from the native 
conformation when / exceeds / c ? For the four sequences with small a (two-state folders in 
the absence of force) we find that, when averaged over an ensemble of initial molecules, 
unfolding occurs in a single kinetic step. However, there is a great variation in the time 
scales of stretching to the rod-like state. This is dramatically illustrated in Fig. (2a) in 
which we plot, for the 36-mer, extension z(t) as a function of t measured in Monte Carlo 
steps (MCS). There is a large unexpected heterogeneity in the approach to the stretched 
state. A striking feature in Fig. (2a) is that there is a large variability in the times taken to 
exhibit significant stretching prior to reaching the rod-like conformation. Global unraveling 
takes place cooperatively with the disruption of local and non-local contacts occurring in an 
all-or-none manner. These features are masked in the ensemble average < z(t) > which is 
shown as a dashed line in Fig. (2a). There is a remarkable similarity between the response of 
these protein-like models to force and that exhibited by flexible polymers subject to sudden 
elongational flow |2(J. Another interesting feature of the unfolding dynamics is that, just as 
in experiments, the force-extension curves show hysteresis during the stretch-release cycles 
(D. Klimov & D. Thirumalai, unpublished). 

The time evolution of the distribution of extension values z for 400 trajectories is plotted 
in Fig. (2b). This plot shows that on time scales less than mean stretch time the chain 
explores a diverse manifold of states each with different z. Certain z values have significantly 
larger probability P(z, t) than the others which suggests that unfolding occurs in a step- 
wise quantized manner. Similar observations have been made using molecular dynamics 
simulations of Ig-like domain ]2l] . 

Despite the large variability in the stretch times t s u the mechanism of approaching the 
rod-like conformation may be qualitatively described as occurring in roughly three stages. 
On the time scale ~ (0.1 — 0.5)r Sj ij following the application of a sudden force there is a 
loss of a number of native contacts. There is a concomitant increase in the extension of the 
chain z(t)/z s ~ (0.1 — 0.5), where z s — N — 1. In the second stage, the sequence searches 
for the equilibrium rod-like conformation. There is great variation in the time scale for this 
search. This stage is characterized by one or several plateaus in z(t) (see Fig. (2a)). Finally, 
the chain explosively and cooperatively makes a transition to the rod state with z(t) / z s ~ 1. 
Naturally, there are several exceptions to this generic scenario. For example, curve (d) in 
Fig. (2a) shows that z(t) reaches its equilibrium value monotonically in an extremely short 
time. 

The dependence of the unfolding mechanisms on topology is illustrated by computing 
the dynamical evolution of all the topologically permissible contacts. We describe the results 
for two 27-mers RB two-state folders labeled A and B (these are the sequences 61 and 63, 
respectively in WW). The native state of each sequence is maximally compact. However, 
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the key topological distinction between them is that in the native conformation for A the 
terminal beads are on the same facet of the cube, whereas for B they are placed directly on 
opposite facets. By tracking the time evolution of the loss of the 156 topological contacts, 
of which 28 are native, we computed the breaking times r 6 fc for contact k. The times r 6 fc are 
determined by (i/N)(N — j)/N which measures how close the contact k, formed between 
beads i and j, is to the sequence ends. For sequence A, we find that the time scales for 
the rupture of contacts are similar for groups of contacts that are close to one or the other 
end of the sequence. In contrast, for sequence B the disruption of contacts from the amino 
terminus (bead 1) occurs fast, while the contacts located near the carboxyl terminus break 
up later in the unfolding process. Thus, topology determines details of the force-induced 
unfolding pathways. Since in Ig-like domain the amino and carboxy termini are at opposite 
ends we expect that the underlying mechanism by which this domain unfolds may be similar 
to that for sequence B. 

In Fig. (3a) we present the force-induced unfolding rate k u as a function of / for the 
36-mer at T s = 0.49. Qualitatively similar results were obtained for the 27-mers as well. The 
unfolding rates were computed from the distribution of stretch times for 800 trajectories. 
It is expected that k u should increase upon increasing / because the activation free energy 
is lowered upon application of force [22 , 23 ]. The free energy profiles as a function of the 



number of native contacts (Q), which is an approximate reaction coordinate for two-state 



folders [24|, are given in Fig. (3b) for the 36-mer at various force values. The decrease in the 
unfolding barrier explains the observed dependence of k u on / for / < f opt . The unfolding 
rate is well described by k u ~ k exp(/ Ax / ksT) for / < f opt which is given by 

f opt ~AF\T s ,f = 0)/Ax, (5) 

where AF*(T S , f = 0) is the unfolding free energy barrier at T = T s and zero force, Ax is an 
approximate width of the unfolding potential, and ko(T s ) is the unfolding rate in the absence 
of force. For the 36-mer AF$ is 2.26 at T s = 0.49 (see Fig. (3b)), f opt ~ 3.2 and therefore 
Ax ~ 0.02L where L is the contour length of the chain. The small value of Ax implies that 
the transition region is quite narrow. When / > f opt the unfolding rate starts to decrease 
because sudden (corresponding to large pulling speed) application of relatively large forces 
traps the polypeptide chain in conformations, whose unfolding requires transient shortening 
of the end-to-end distance. The transition from such conformations, which requires local 
annealing of the chain, slows down the unfolding process. For / > 7 (see Fig. (3a)) there 
is a free energy barrier associated with the breakup of contacts in the conformations acting 
as transient kinetic traps. In this force regime the fraction of molecules that are folded at a 
time t is best fit using a sum of two exponentials with the slow phase signaling the onset of 
local trapping. 

Our results for unfolding triggered by force are consistent with a number of experimental 
observations on the unraveling of isolated Ig-like domains in titin fi]-|6]||. (1) The ratio 
of fc/fo for Ig-like domains can be computed using Eq. (H) with T = 25°C, the folding 



temperature Tp m Q0°C [25], and a ~ 6.0. This gives f c /fo ~ 0.49. From the phase diagram 
in Fig. (la) we obtain a similar value for the 36-mer when the temperature (measured in 
Kelvin) is approximately 0.89Tp. Thus the general shape of the phase boundary should be 
useful in calibrating the experimental measurements on proteins. (2) The typical values of 
the threshold force fo required to induce stretching in the two-state folders are in the range 
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of (1 — 2.5). By translating these into physical units we obtain / ~ (20 — 50) pN . Using 
this we would predict that the unfolding force f c , which depends on the pulling speed, for 
Ig-like domains in titin should be around (10 — 25) pN. These values are not inconsistent 
with the experimental measurements (see Fig. (5) of 0). (3) The width of the unfolding 
potential Ax which is obtained using Eq. (|5|) and the computed values of f opt and the 
activation free energy AF$ (see Fig. (3b)) in the absence of force for the 36-mer is 0.02L. 
Rief et al. || estimated that Ax/L « 0.01 using Ax = 0.3 ram and L = 31 ram. If we use 
the fact that the lattice constant a, which gives the distance between a-carbon atoms, is 
~ 0.4 ram then the value of Ax for the 36-mer in physical units is roughly 0.3 ram. These 
numbers are in very good accord with the experiments. 

The theoretical findings can be used to map the underlying folding free energy landscape 
for two-state proteins using data from force-induced unfolding experiments. This is illus- 
trated by applying our results for the 36-mer to Ig-like domains. An estimate of f opt for Ig 
domain can be made using the value of 3.2 for the 36-mer, and by assuming that a f opt scales 
linearly with N. For the 90 residue Ig-like domain we find that f opt ~ 160 pN. The unfold- 
ing barrier is f op tAx which is approximately 12/c^T assuming that Ax = 0.3 ram 0. From 
the stability of Ig domain (AG ^2.6 kcal/mol [|K|) we predict that the refolding barrier is 
approximately 4.6 kcal/mol. From these the folding and unfolding times are predicted to 
be 0.1s and 400 s respectively. These predictions are in fairly reasonable agreement with 
experimental estimates |K| . The estimates of barriers to folding by force- induced unfolding 
measurements are likely to be complement to the standard method of measuring rates at 
finite denaturant concentration and then extrapolating to the desired values in the absence 
of denaturants. 



IV. CONCLUSIONS 

This study has led to the following predictions: (i) The unfolding time scales should 
decrease exponentially with force only till an optimum value of force, whose magnitude is 
determined by only the unfolding barrier in the absence of force, (ii) The phase diagram, 
especially the boundary separating the unfolded and folded states, has the characteristic 
type seen in Fig. (1) and is quantitatively given by Eq. (|J). (iii) The nature of force- 
induced unfolding depends on the proximity of the amino and carboxy termini in the native 
state. These predictions are all amenable to experimental test. Because the response to force 
depends sensitively on the characteristics of the sequence when / is zero, it follows that the 
mechanisms of protein folding (presence of intermediates, the nature of the transition states, 
and barriers to folding) may be very directly probed by single molecule force spectroscopy. 
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FIGURES 



Fig. 1. Phase diagrams in the (/, T) plane for (a) the 36-mer two-state KGS sequence and 
(b) 27-mer moderate folding RB sequence. The color code for < x{f\ T) > is given on the 
right. The red color corresponds to the states in the NBA, whereas blue color indicates the 
unfolded states. Green areas correspond to intermediate partially folded states. For both 
the sequences the boundary of NBA is given by Eq. (|J) . 

Fig. 2. (a) The time dependence of extension z(t)/z s for 201 individual trajectories of 
36-mer sequence at T s = 0.49 and f s = 4.0. Three generic trajectories (curves (a)-(c)), the 
fastest (d), and the slowest (e) trajectories are shown in black. The rest are given in grey. 
The average < z(t) > obtained from 400 individual trajectories is represented by a thick 
dashed curve. 

(b) The probability distribution P(z/z s ,t) for 36-mer sequence under the same unfolding 
conditions as in Fig. (2a). P(z/z s ,t) gives the kinetic probability of occurrence of the 
extension value z at the time t. In both plots the extension values are normalized by 
z s = N - 1. 

Fig. 3. (a) The dependence of unfolding rate k u (filled circles) on / for the 36-mer sequence 
at T s = 0.49. For values of / less than 7.0 the force-induced unfolding takes place by a two- 
state process. For / > 7.0 unfolding trajectories separate into a fast pool (the fraction 
of which is $) that reaches the stretch state with the rate 1/t S} fast (open circles) and a 
slow pool trajectories characterized by the rate 1/t S) slow (open squares). At these / the 
partition factor $ < 1. 

(b) The free energy F(Q)/T S for 36-mer sequence at T s = 0.49 and / = 0.0 (diamonds), 
/ = 0.3 (full circles), / = 0.54 (open circles), and / = 0.7 (open squares). It is seen that 
the free energy unfolding barrier AF^/T S decreases as / becomes stronger: AF$/T S = 4.6 
at / = 0.0, 4.0 at / = 0.3, 3.3 at / = 0.54, and 2.8 at / = 0.7. The free energy of stability 
(in units of T 8 ) at / = 0.0 is 2.0. 
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